2. Deep Learning: A Simple Example¶
Let’s get back to the Name Gender Classifier.

2.1. Prepare Data¶
## Package Dependencies
import os
import shutil
import numpy as np
import nltk
from nltk.corpus import names
import random
from sklearn.model_selection import train_test_split
from sklearn.manifold import TSNE
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['figure.dpi'] = 150
from lime.lime_text import LimeTextExplainer
import tensorflow
import tensorflow.keras as keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
# from keras.layers import Dense
# from keras.layers import LSTM, RNN, GRU
# from keras.layers import Embedding
# from keras.layers import SpatialDropout1D
import kerastuner
labeled_names = ([(name, 1) for name in names.words('male.txt')] +
                 [(name, 0) for name in names.words('female.txt')])
random.shuffle(labeled_names)
2.2. Train-Test Split¶
train_set, test_set = train_test_split(labeled_names,
test_size=0.2,
random_state=42)
print(len(train_set), len(test_set))
6355 1589
names = [n for (n, l) in train_set] ## X
labels = [l for (n, l) in train_set] ## y
len(names)
6355
2.3. Tokenizer¶
keras.preprocessing.text.Tokenizer is a very useful tokenizer for text processing in deep learning.
Tokenizer assumes that the word tokens of the input texts have been delimited by whitespaces.
Tokenizer provides the following functions:
It first creates a dictionary for the entire corpus, i.e., a mapping of each word token to its unique integer index (Tokenizer.fit_on_texts()).
It can then use this corpus dictionary to convert the words in each text into integer sequences (Tokenizer.texts_to_sequences()).
The dictionary is stored in Tokenizer.word_index.
Notes on Tokenizer:
By default, the token index 0 is reserved for the padding token.
If oov_token is specified, it defaults to index 1. (Default: oov_token=False)
Specify num_words=N for Tokenizer to include only the top N words when converting texts to sequences.
Tokenizer automatically removes punctuation.
Tokenizer uses whitespace as the word-token delimiter.
To treat every character as a token, specify char_level=True.
Please read the Tokenizer documentation very carefully. Very important!!
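As a rough illustration of what fit_on_texts() does under the hood, here is a plain-Python sketch (not the actual Keras implementation; it skips Tokenizer's punctuation filtering and other options):

```python
from collections import Counter

def fit_on_texts(texts, char_level=False):
    """Build a word_index like Tokenizer: tokens ranked by frequency.

    Indices start at 1 because index 0 is reserved for padding.
    """
    tokens = []
    for t in texts:
        t = t.lower()
        tokens.extend(list(t) if char_level else t.split())
    return {tok: i + 1
            for i, (tok, _) in enumerate(Counter(tokens).most_common())}

word_index = fit_on_texts(["Anna", "Bob"], char_level=True)
# 'a', 'n', and 'b' all occur twice; ties keep first-seen order,
# so 'a' gets index 1 and index 0 stays unassigned (padding)
```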
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(names) ## similar to CountVectorizer.fit()
Vocabulary¶
After calling Tokenizer.fit_on_texts(), we can take a look at the corpus dictionary, i.e., the mapping of words to their unique integer indices.
# determine the vocabulary size
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
Vocabulary Size: 29
tokenizer.word_index
{'a': 1,
'e': 2,
'i': 3,
'n': 4,
'r': 5,
'l': 6,
'o': 7,
't': 8,
's': 9,
'd': 10,
'm': 11,
'y': 12,
'h': 13,
'c': 14,
'b': 15,
'u': 16,
'g': 17,
'k': 18,
'j': 19,
'f': 20,
'v': 21,
'p': 22,
'w': 23,
'z': 24,
'x': 25,
'q': 26,
'-': 27,
' ': 28}
2.4. Prepare Input and Output Tensors¶
Like in feature-based machine learning, a computational model only accepts numeric values. It is therefore necessary to convert raw texts into numeric tensors for the neural network.
After we create the Tokenizer, we use it to perform text vectorization, i.e., converting texts into tensors.
In deep learning, words or characters are automatically converted into numeric representations. In other words, the feature engineering step is fully automatic.
Two Ways of Text Vectorization¶
Texts to Sequences:
For each text, we convert all word tokens into integer sequences.
These integer sequences will then be transformed into embeddings in the deep learning network.
These embeddings are usually the basis for deep learning sequence models (i.e., RNN).
Texts to Matrix:
For each text, we vectorize the entire text into a vector of bag-of-words representation.
A one-hot encoding of the entire text sets every value of the text vector to either 0 or 1, indicating whether each word in the dictionary occurs in the text.
We can of course use the frequency-based bag-of-words representation (similar to
CountVectorizer()).
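The contrast between the two vectorization methods can be sketched in plain Python. A toy char-level corpus dictionary is assumed here; in the tutorial the real mapping comes from tokenizer.word_index:

```python
# Toy stand-in for tokenizer.word_index (char-level)
word_index = {'a': 1, 'n': 2, 'b': 3, 'o': 4}

def texts_to_sequences(texts):
    # One integer per character; order and length are preserved
    return [[word_index[c] for c in t.lower() if c in word_index]
            for t in texts]

def texts_to_matrix(texts, mode="binary"):
    # One fixed-length bag-of-words vector per text; index 0 is unused (padding)
    vecs = []
    for t in texts:
        row = [0] * (len(word_index) + 1)
        for c in t.lower():
            if c in word_index:
                row[word_index[c]] = 1 if mode == "binary" else row[word_index[c]] + 1
        vecs.append(row)
    return vecs

texts_to_sequences(["Anna"])            # → [[1, 2, 2, 1]]
texts_to_matrix(["Anna"])               # → [[0, 1, 1, 0, 0]]
texts_to_matrix(["Anna"], mode="count") # → [[0, 2, 2, 0, 0]]
```

Note how the matrix form loses both the order and the length of the original text, which is why sequence models use the first form.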
2.5. Method 1: Text to Sequences¶
From Texts and Sequences¶
We can convert our corpus texts into integer sequences using Tokenizer.texts_to_sequences().
Because texts vary in length, we use keras.preprocessing.sequence.pad_sequences() to pad all texts to a uniform length.
This step is important: it ensures that every text, when fed into the network, has exactly the same tensor shape.
names_ints = tokenizer.texts_to_sequences(names)
print(names[:5])
print(names_ints[:5])
print(labels[:5])
['Lenee', 'Vivyan', 'Greta', 'Jerrold', 'Frankie']
[[6, 2, 4, 2, 2], [21, 3, 21, 12, 1, 4], [17, 5, 2, 8, 1], [19, 2, 5, 5, 7, 6, 10], [20, 5, 1, 4, 18, 3, 2]]
[0, 0, 0, 1, 0]
Padding¶
When padding all texts to uniform lengths, consider whether to pad or remove values at the beginning of the sequence (i.e., pre) or at the end (post).
Check the padding and truncating parameters in pad_sequences.
In this tutorial, we first identify the longest name, use its length as the max_len, and pad all names to that max_len.
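A minimal plain-Python sketch of the pre/post behavior (mimicking the keras defaults, not the actual implementation):

```python
def pad_sequences(seqs, maxlen, padding="pre", truncating="pre", value=0):
    """Pad/truncate each sequence to maxlen; keras defaults act at the front."""
    out = []
    for s in seqs:
        s = s[-maxlen:] if truncating == "pre" else s[:maxlen]
        pad = [value] * (maxlen - len(s))
        out.append(pad + s if padding == "pre" else s + pad)
    return out

seq = [[6, 2, 4, 2, 2]]                       # 'Lenee' as integers
pad_sequences(seq, maxlen=7)                  # → [[0, 0, 6, 2, 4, 2, 2]]
pad_sequences(seq, maxlen=7, padding="post")  # → [[6, 2, 4, 2, 2, 0, 0]]
pad_sequences(seq, maxlen=3)                  # → [[4, 2, 2]] (front truncated)
```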
## We can check the length distribution of texts in corpus
names_lens = [len(n) for n in names_ints]
names_lens
sns.displot(names_lens)
print(names[np.argmax(names_lens)]) # longest name
Helen-Elizabeth
max_len = names_lens[np.argmax(names_lens)]
max_len
15
names_ints_pad = sequence.pad_sequences(names_ints, maxlen=max_len)
names_ints_pad[:10]
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 2, 4, 2, 2],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 21, 3, 21, 12, 1, 4],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 5, 2, 8, 1],
[ 0, 0, 0, 0, 0, 0, 0, 0, 19, 2, 5, 5, 7, 6, 10],
[ 0, 0, 0, 0, 0, 0, 0, 0, 20, 5, 1, 4, 18, 3, 2],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 11, 3, 6, 12],
[ 0, 0, 0, 0, 0, 0, 0, 0, 6, 7, 5, 2, 4, 24, 7],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 20, 5, 2, 10, 5, 1],
[ 0, 0, 0, 0, 0, 14, 7, 4, 9, 8, 1, 4, 14, 3, 1],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1, 5, 7, 6, 2]],
dtype=int32)
Define X and y¶
So for the names, we convert them into integer sequences and pad them to a uniform length.
We perform exactly the same processing on the names in the test data.
We convert both the names (X) and labels (y) into
numpy.array.
## training data
X_train = np.array(names_ints_pad).astype('int32')
y_train = np.array(labels)
## testing data
X_test_texts = [n for (n, l) in test_set]
X_test = np.array(
sequence.pad_sequences(tokenizer.texts_to_sequences(X_test_texts),
maxlen=max_len)).astype('int32')
y_test = np.array([l for (n, l) in test_set])
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
(6355, 15)
(6355,)
(1589, 15)
(1589,)
2.6. Method 2: Text to Matrix¶
One-Hot Encoding (Bag-of-Words)¶
We can convert each text into a bag-of-words vector using Tokenizer.texts_to_matrix().
In particular, we can specify the parameter mode: binary, count, or tfidf.
When mode="binary", the text vector is a one-hot encoding vector, indicating whether a character occurs in the text or not.
names_matrix = tokenizer.texts_to_matrix(names, mode="binary")
print(names_matrix.shape)
(6355, 29)
print(names[2])
print(names_matrix[2,:])
Greta
[0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0.]
tokenizer.word_index
{'a': 1,
'e': 2,
'i': 3,
'n': 4,
'r': 5,
'l': 6,
'o': 7,
't': 8,
's': 9,
'd': 10,
'm': 11,
'y': 12,
'h': 13,
'c': 14,
'b': 15,
'u': 16,
'g': 17,
'k': 18,
'j': 19,
'f': 20,
'v': 21,
'p': 22,
'w': 23,
'z': 24,
'x': 25,
'q': 26,
'-': 27,
' ': 28}
Define X and y¶
X_train2 = np.array(names_matrix).astype('int32')
y_train2 = np.array(labels)
X_test2 = tokenizer.texts_to_matrix(X_test_texts,
mode="binary").astype('int32')
y_test2 = np.array([l for (n, l) in test_set])
print(X_train2.shape)
print(y_train2.shape)
print(X_test2.shape)
print(y_test2.shape)
(6355, 29)
(6355,)
(1589, 29)
(1589,)
2.7. Model Definition¶
Three important steps for building a deep neural network:
Define the model structure
Compile the model
Fit the model
After we have defined our input and output tensors (X and y), we can define the architecture of our neural network model.
For the two ways of name vectorized representations, we try two different types of networks.
Text to Matrix: Fully connected Dense Layers
Text to Sequences: Embedding + RNN
# Two Versions of Plotting Functions for `history` from `model.fit()`
def plot1(history):
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)
    ## Accuracy plot
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.plot(epochs, val_acc, 'b', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    ## Loss plot
    plt.figure()
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.plot(epochs, val_loss, 'b', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()

def plot2(history):
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    # plt.gca().set_ylim(0, 1)
    plt.show()
Model 1: Fully Connected Dense Layers¶
Let’s try a simple neural network with two fully-connected dense layers with the Text-to-Matrix inputs.
That is, the input of this model is the bag-of-words representation of the entire name.

Dense Layer Operation¶
Each Dense layer transforms the input tensor into an output tensor whose last dimension equals the number of nodes in that Dense layer.

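The Dense operation can be sketched in NumPy. The shapes below match this tutorial's settings (vocab_size of 29, 16-node layers), but the weights are random stand-ins, not trained parameters:

```python
import numpy as np

def dense(x, W, b, activation):
    # output = activation(x @ W + b); the last dimension becomes
    # W.shape[1], i.e., the number of nodes in the layer
    return activation(x @ W + b)

relu = lambda z: np.maximum(z, 0)
rng = np.random.default_rng(42)
x = rng.normal(size=(4, 29))       # batch of 4 bag-of-words vectors (vocab_size = 29)
W1 = rng.normal(size=(29, 16))     # 16 nodes in the Dense layer
b1 = np.zeros(16)
h = dense(x, W1, b1, relu)         # shape (4, 16)
```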
## Define Model
model1 = keras.Sequential()
model1.add(keras.Input(shape=(vocab_size, ), name="one_hot_input"))
model1.add(layers.Dense(16, activation="relu", name="dense_layer_1"))
model1.add(layers.Dense(16, activation="relu", name="dense_layer_2"))
model1.add(layers.Dense(1, activation="sigmoid", name="output"))
## Compile Model
model1.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
plot_model(model1, show_shapes=True)
A few hyperparameters for network training¶
Batch Size: the number of input samples used per update of the model parameters (one gradient-descent step)
Epoch: the number of complete passes through the training data
Validation Split Ratio: the proportion of the training data held out for validation
## Hyperparameters
BATCH_SIZE = 128
EPOCHS = 20
VALIDATION_SPLIT = 0.2
## Fit the model
history1 = model1.fit(X_train2,
y_train2,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=2,
validation_split=VALIDATION_SPLIT)
Epoch 1/20
40/40 - 3s - loss: 0.6776 - accuracy: 0.5738 - val_loss: 0.6645 - val_accuracy: 0.6058
Epoch 2/20
40/40 - 0s - loss: 0.6427 - accuracy: 0.6432 - val_loss: 0.6489 - val_accuracy: 0.6216
Epoch 3/20
40/40 - 0s - loss: 0.6235 - accuracy: 0.6576 - val_loss: 0.6355 - val_accuracy: 0.6294
Epoch 4/20
40/40 - 0s - loss: 0.6068 - accuracy: 0.6755 - val_loss: 0.6257 - val_accuracy: 0.6381
Epoch 5/20
40/40 - 0s - loss: 0.5922 - accuracy: 0.6920 - val_loss: 0.6208 - val_accuracy: 0.6499
Epoch 6/20
40/40 - 0s - loss: 0.5816 - accuracy: 0.7022 - val_loss: 0.6129 - val_accuracy: 0.6648
Epoch 7/20
40/40 - 0s - loss: 0.5732 - accuracy: 0.7122 - val_loss: 0.6105 - val_accuracy: 0.6648
Epoch 8/20
40/40 - 0s - loss: 0.5674 - accuracy: 0.7166 - val_loss: 0.6079 - val_accuracy: 0.6727
Epoch 9/20
40/40 - 0s - loss: 0.5631 - accuracy: 0.7160 - val_loss: 0.6075 - val_accuracy: 0.6727
Epoch 10/20
40/40 - 0s - loss: 0.5583 - accuracy: 0.7223 - val_loss: 0.6067 - val_accuracy: 0.6727
Epoch 11/20
40/40 - 0s - loss: 0.5555 - accuracy: 0.7246 - val_loss: 0.6036 - val_accuracy: 0.6751
Epoch 12/20
40/40 - 0s - loss: 0.5527 - accuracy: 0.7256 - val_loss: 0.6012 - val_accuracy: 0.6751
Epoch 13/20
40/40 - 0s - loss: 0.5508 - accuracy: 0.7258 - val_loss: 0.6000 - val_accuracy: 0.6727
Epoch 14/20
40/40 - 0s - loss: 0.5484 - accuracy: 0.7264 - val_loss: 0.5992 - val_accuracy: 0.6814
Epoch 15/20
40/40 - 0s - loss: 0.5471 - accuracy: 0.7254 - val_loss: 0.5999 - val_accuracy: 0.6829
Epoch 16/20
40/40 - 0s - loss: 0.5455 - accuracy: 0.7282 - val_loss: 0.6017 - val_accuracy: 0.6806
Epoch 17/20
40/40 - 0s - loss: 0.5433 - accuracy: 0.7323 - val_loss: 0.6037 - val_accuracy: 0.6814
Epoch 18/20
40/40 - 0s - loss: 0.5426 - accuracy: 0.7295 - val_loss: 0.5996 - val_accuracy: 0.6814
Epoch 19/20
40/40 - 0s - loss: 0.5411 - accuracy: 0.7317 - val_loss: 0.5986 - val_accuracy: 0.6814
Epoch 20/20
40/40 - 0s - loss: 0.5398 - accuracy: 0.7352 - val_loss: 0.5991 - val_accuracy: 0.6790
plot1(history1)
model1.evaluate(X_test2, y_test2, batch_size=BATCH_SIZE, verbose=2)
13/13 - 0s - loss: 0.5743 - accuracy: 0.7017
[0.5742583870887756, 0.7016991972923279]
Model 2: Embedding + RNN¶
Another possibility is to introduce an embedding layer in the network, which transforms each character of the name into a tensor (i.e., embeddings), and then we add a Recurrent Neural Network (RNN) layer to process each character sequentially.
The strength of the RNN is that it iterates over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.
It is posited that after the RNN iterates through the entire sequence, it keeps important information of all previously iterated tokens for further operation.
The input of this network is a padded sequence of the original text (name).

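The RNN state update described above can be sketched in NumPy (random stand-in weights; dimensions match this tutorial's settings of 15 timesteps, 128-dim embeddings, and 16 hidden units):

```python
import numpy as np

def simple_rnn_last_state(X, Wx, Wh, b):
    # h_t = tanh(x_t @ Wx + h_{t-1} @ Wh + b), iterated over timesteps;
    # the final h summarizes the entire sequence
    h = np.zeros(Wh.shape[0])
    for x_t in X:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
    return h

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 128))          # one padded name: 15 timesteps of 128-dim embeddings
Wx = rng.normal(size=(128, 16)) * 0.1   # input-to-hidden weights (16 hidden units)
Wh = rng.normal(size=(16, 16)) * 0.1    # hidden-to-hidden (recurrent) weights
b = np.zeros(16)
state = simple_rnn_last_state(X, Wx, Wh, b)   # shape (16,)
```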
## Define the embedding dimension
EMBEDDING_DIM = 128
## Define model
model2 = Sequential()
model2.add(
layers.Embedding(input_dim=vocab_size,
output_dim=EMBEDDING_DIM,
input_length=max_len,
mask_zero=True))
model2.add(layers.SimpleRNN(16, activation="relu", name="RNN_layer"))
model2.add(layers.Dense(16, activation="relu", name="dense_layer"))
model2.add(layers.Dense(1, activation="sigmoid", name="output"))
model2.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
Embedding Layer Operation¶

RNN Layer Operation¶

Unrolled Version of RNN Operation¶

plot_model(model2, show_shapes=True)
history2 = model2.fit(X_train,
y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=2,
validation_split=VALIDATION_SPLIT)
Epoch 1/20
40/40 - 2s - loss: 0.5768 - accuracy: 0.7191 - val_loss: 0.5161 - val_accuracy: 0.7215
Epoch 2/20
40/40 - 0s - loss: 0.4477 - accuracy: 0.7838 - val_loss: 0.4581 - val_accuracy: 0.7726
Epoch 3/20
40/40 - 0s - loss: 0.4124 - accuracy: 0.8021 - val_loss: 0.4471 - val_accuracy: 0.7797
Epoch 4/20
40/40 - 0s - loss: 0.4004 - accuracy: 0.8059 - val_loss: 0.4421 - val_accuracy: 0.7836
Epoch 5/20
40/40 - 0s - loss: 0.3941 - accuracy: 0.8141 - val_loss: 0.4390 - val_accuracy: 0.7868
Epoch 6/20
40/40 - 0s - loss: 0.3875 - accuracy: 0.8204 - val_loss: 0.4429 - val_accuracy: 0.7773
Epoch 7/20
40/40 - 0s - loss: 0.3840 - accuracy: 0.8192 - val_loss: 0.4546 - val_accuracy: 0.7836
Epoch 8/20
40/40 - 0s - loss: 0.3796 - accuracy: 0.8224 - val_loss: 0.4440 - val_accuracy: 0.7750
Epoch 9/20
40/40 - 0s - loss: 0.3762 - accuracy: 0.8257 - val_loss: 0.4449 - val_accuracy: 0.7805
Epoch 10/20
40/40 - 0s - loss: 0.3737 - accuracy: 0.8259 - val_loss: 0.4398 - val_accuracy: 0.7860
Epoch 11/20
40/40 - 0s - loss: 0.3692 - accuracy: 0.8304 - val_loss: 0.4329 - val_accuracy: 0.7923
Epoch 12/20
40/40 - 0s - loss: 0.3660 - accuracy: 0.8289 - val_loss: 0.4336 - val_accuracy: 0.7923
Epoch 13/20
40/40 - 0s - loss: 0.3632 - accuracy: 0.8289 - val_loss: 0.4504 - val_accuracy: 0.7860
Epoch 14/20
40/40 - 0s - loss: 0.3607 - accuracy: 0.8316 - val_loss: 0.4381 - val_accuracy: 0.7891
Epoch 15/20
40/40 - 0s - loss: 0.3575 - accuracy: 0.8362 - val_loss: 0.4382 - val_accuracy: 0.7868
Epoch 16/20
40/40 - 0s - loss: 0.3563 - accuracy: 0.8328 - val_loss: 0.4495 - val_accuracy: 0.7899
Epoch 17/20
40/40 - 0s - loss: 0.3566 - accuracy: 0.8342 - val_loss: 0.4358 - val_accuracy: 0.7931
Epoch 18/20
40/40 - 0s - loss: 0.3518 - accuracy: 0.8397 - val_loss: 0.4363 - val_accuracy: 0.7946
Epoch 19/20
40/40 - 0s - loss: 0.3492 - accuracy: 0.8383 - val_loss: 0.4322 - val_accuracy: 0.7962
Epoch 20/20
40/40 - 0s - loss: 0.3453 - accuracy: 0.8434 - val_loss: 0.4415 - val_accuracy: 0.7907
plot1(history2)
model2.evaluate(X_test, y_test, batch_size=BATCH_SIZE, verbose=2)
13/13 - 0s - loss: 0.4286 - accuracy: 0.7992
[0.42862406373023987, 0.7992448210716248]
Model 3: Regularization and Dropout¶
Based on the validation results of the previous two models (esp. the RNN-based model), we can see that they are probably overfitting a bit, because the model performance on the validation set starts to stall after the first few epochs.
We can add regularization and dropouts in our network definition to avoid overfitting.
## Define embedding dimension
EMBEDDING_DIM = 128
## Define model
model3 = Sequential()
model3.add(
layers.Embedding(input_dim=vocab_size,
output_dim=EMBEDDING_DIM,
input_length=max_len,
mask_zero=True))
model3.add(
layers.SimpleRNN(16,
activation="relu",
name="RNN_layer",
dropout=0.2, ## dropout for input character
recurrent_dropout=0.2)) ## dropout for previous state
model3.add(layers.Dense(16, activation="relu", name="dense_layer"))
model3.add(layers.Dense(1, activation="sigmoid", name="output"))
model3.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
plot_model(model3, show_shapes=True)
history3 = model3.fit(X_train,
y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=2,
validation_split=VALIDATION_SPLIT)
Epoch 1/20
40/40 - 2s - loss: 0.6089 - accuracy: 0.6410 - val_loss: 0.5670 - val_accuracy: 0.6428
Epoch 2/20
40/40 - 0s - loss: 0.5130 - accuracy: 0.7299 - val_loss: 0.5072 - val_accuracy: 0.7522
Epoch 3/20
40/40 - 0s - loss: 0.4730 - accuracy: 0.7705 - val_loss: 0.4793 - val_accuracy: 0.7616
Epoch 4/20
40/40 - 0s - loss: 0.4528 - accuracy: 0.7905 - val_loss: 0.4637 - val_accuracy: 0.7648
Epoch 5/20
40/40 - 0s - loss: 0.4372 - accuracy: 0.7878 - val_loss: 0.4554 - val_accuracy: 0.7750
Epoch 6/20
40/40 - 0s - loss: 0.4304 - accuracy: 0.7931 - val_loss: 0.4490 - val_accuracy: 0.7797
Epoch 7/20
40/40 - 0s - loss: 0.4254 - accuracy: 0.7964 - val_loss: 0.4444 - val_accuracy: 0.7821
Epoch 8/20
40/40 - 0s - loss: 0.4222 - accuracy: 0.7976 - val_loss: 0.4464 - val_accuracy: 0.7758
Epoch 9/20
40/40 - 0s - loss: 0.4172 - accuracy: 0.8017 - val_loss: 0.4448 - val_accuracy: 0.7797
Epoch 10/20
40/40 - 0s - loss: 0.4154 - accuracy: 0.7964 - val_loss: 0.4480 - val_accuracy: 0.7821
Epoch 11/20
40/40 - 0s - loss: 0.4133 - accuracy: 0.7988 - val_loss: 0.4450 - val_accuracy: 0.7821
Epoch 12/20
40/40 - 0s - loss: 0.4075 - accuracy: 0.8063 - val_loss: 0.4411 - val_accuracy: 0.7821
Epoch 13/20
40/40 - 0s - loss: 0.4042 - accuracy: 0.8090 - val_loss: 0.4429 - val_accuracy: 0.7860
Epoch 14/20
40/40 - 0s - loss: 0.4104 - accuracy: 0.8057 - val_loss: 0.4391 - val_accuracy: 0.7899
Epoch 15/20
40/40 - 0s - loss: 0.4097 - accuracy: 0.8017 - val_loss: 0.4363 - val_accuracy: 0.7891
Epoch 16/20
40/40 - 0s - loss: 0.4039 - accuracy: 0.8027 - val_loss: 0.4375 - val_accuracy: 0.7813
Epoch 17/20
40/40 - 0s - loss: 0.4057 - accuracy: 0.8041 - val_loss: 0.4358 - val_accuracy: 0.7876
Epoch 18/20
40/40 - 0s - loss: 0.4051 - accuracy: 0.8043 - val_loss: 0.4341 - val_accuracy: 0.7884
Epoch 19/20
40/40 - 0s - loss: 0.4018 - accuracy: 0.8059 - val_loss: 0.4339 - val_accuracy: 0.7899
Epoch 20/20
40/40 - 0s - loss: 0.4027 - accuracy: 0.8000 - val_loss: 0.4334 - val_accuracy: 0.7860
plot1(history3)
model3.evaluate(X_test, y_test, batch_size=BATCH_SIZE, verbose=2)
13/13 - 0s - loss: 0.4311 - accuracy: 0.7885
[0.4311422109603882, 0.7885462641716003]
Model 4: Improve the Models¶
In addition to regularization and dropouts, we can further improve the model by increasing the model complexity.
In particular, we can increase the depths and widths of the network layers.
Let’s try stacking two RNN layers.
Tip
When we stack two sequence layers (e.g., RNN), we need to make sure that the hidden states (outputs) of the first sequence layer at all timesteps are properly passed onto the next sequence layer, not just the hidden state (output) of the last timestep.
In keras, this usually means that we need to set the argument return_sequences=True in a sequence layer (e.g., SimpleRNN, LSTM, GRU etc).
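A NumPy sketch of what return_sequences changes (stand-in weights; the shapes are illustrative):

```python
import numpy as np

def simple_rnn(X, Wx, Wh, b, return_sequences=False):
    # With return_sequences=True we keep the hidden state of every timestep,
    # which is exactly what a stacked sequence layer needs as its input
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states) if return_sequences else h

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 8))             # 15 timesteps, 8-dim inputs
Wx = rng.normal(size=(8, 4))
Wh = rng.normal(size=(4, 4))
b = np.zeros(4)
simple_rnn(X, Wx, Wh, b).shape                          # (4,): last state only
simple_rnn(X, Wx, Wh, b, return_sequences=True).shape   # (15, 4): one state per timestep
```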
## Define embedding dimension
EMBEDDING_DIM = 128
## Define model
model4 = Sequential()
model4.add(
layers.Embedding(input_dim=vocab_size,
output_dim=EMBEDDING_DIM,
input_length=max_len,
mask_zero=True))
model4.add(
layers.SimpleRNN(16,
activation="relu",
name="RNN_layer_1",
dropout=0.2,
recurrent_dropout=0.2,
return_sequences=True)
) ## To ensure the hidden states of all timesteps are passed down to the next layer
model4.add(
layers.SimpleRNN(16,
activation="relu",
name="RNN_layer_2",
dropout=0.2,
recurrent_dropout=0.2))
model4.add(layers.Dense(1, activation="sigmoid", name="output"))
## Compile model
model4.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
plot_model(model4, show_shapes=True)
history4 = model4.fit(X_train,
y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=2,
validation_split=VALIDATION_SPLIT)
Epoch 1/20
40/40 - 4s - loss: 0.6304 - accuracy: 0.6391 - val_loss: 0.6163 - val_accuracy: 0.6058
Epoch 2/20
40/40 - 1s - loss: 0.5742 - accuracy: 0.6603 - val_loss: 0.5411 - val_accuracy: 0.7175
Epoch 3/20
40/40 - 0s - loss: 0.5082 - accuracy: 0.7411 - val_loss: 0.4776 - val_accuracy: 0.7530
Epoch 4/20
40/40 - 0s - loss: 0.4679 - accuracy: 0.7762 - val_loss: 0.4597 - val_accuracy: 0.7624
Epoch 5/20
40/40 - 0s - loss: 0.4513 - accuracy: 0.7815 - val_loss: 0.4489 - val_accuracy: 0.7687
Epoch 6/20
40/40 - 1s - loss: 0.4411 - accuracy: 0.7905 - val_loss: 0.4474 - val_accuracy: 0.7750
Epoch 7/20
40/40 - 1s - loss: 0.4377 - accuracy: 0.7939 - val_loss: 0.4459 - val_accuracy: 0.7766
Epoch 8/20
40/40 - 0s - loss: 0.4340 - accuracy: 0.7899 - val_loss: 0.4443 - val_accuracy: 0.7836
Epoch 9/20
40/40 - 0s - loss: 0.4283 - accuracy: 0.7978 - val_loss: 0.4396 - val_accuracy: 0.7821
Epoch 10/20
40/40 - 0s - loss: 0.4304 - accuracy: 0.7996 - val_loss: 0.4415 - val_accuracy: 0.7828
Epoch 11/20
40/40 - 1s - loss: 0.4288 - accuracy: 0.8007 - val_loss: 0.4380 - val_accuracy: 0.7836
Epoch 12/20
40/40 - 1s - loss: 0.4229 - accuracy: 0.7978 - val_loss: 0.4340 - val_accuracy: 0.7891
Epoch 13/20
40/40 - 0s - loss: 0.4237 - accuracy: 0.7980 - val_loss: 0.4339 - val_accuracy: 0.7821
Epoch 14/20
40/40 - 0s - loss: 0.4215 - accuracy: 0.8031 - val_loss: 0.4386 - val_accuracy: 0.7758
Epoch 15/20
40/40 - 1s - loss: 0.4249 - accuracy: 0.7974 - val_loss: 0.4357 - val_accuracy: 0.7828
Epoch 16/20
40/40 - 0s - loss: 0.4189 - accuracy: 0.7954 - val_loss: 0.4346 - val_accuracy: 0.7821
Epoch 17/20
40/40 - 0s - loss: 0.4224 - accuracy: 0.8037 - val_loss: 0.4337 - val_accuracy: 0.7868
Epoch 18/20
40/40 - 0s - loss: 0.4161 - accuracy: 0.8045 - val_loss: 0.4346 - val_accuracy: 0.7876
Epoch 19/20
40/40 - 0s - loss: 0.4113 - accuracy: 0.8065 - val_loss: 0.4323 - val_accuracy: 0.7860
Epoch 20/20
40/40 - 0s - loss: 0.4150 - accuracy: 0.8029 - val_loss: 0.4384 - val_accuracy: 0.7844
plot1(history4)
model4.evaluate(X_test, y_test, batch_size=BATCH_SIZE, verbose=2)
13/13 - 0s - loss: 0.4374 - accuracy: 0.7778
[0.4374392628669739, 0.7778477072715759]
Model 5: Bidirectional¶
We can also increase the model complexity in at least three ways:
Use more advanced RNNs, such as LSTM or GRU
Process the sequence in two directions
Increase the hidden nodes of the RNN/LSTM
Now let’s try a more sophisticated RNN variant, the LSTM, with bidirectional sequence processing and more nodes in the LSTM layer.
## Define embedding dimension
EMBEDDING_DIM = 128
## Define model
model5 = Sequential()
model5.add(
layers.Embedding(input_dim=vocab_size,
output_dim=EMBEDDING_DIM,
input_length=max_len,
mask_zero=True))
model5.add(
layers.Bidirectional( ## Bidirectional sequence processing
layers.LSTM(32,
activation="relu",
name="lstm_layer_1",
dropout=0.2,
recurrent_dropout=0.5,
return_sequences=True)))
model5.add(
layers.Bidirectional( ## Bidirectional sequence processing
layers.LSTM(32,
activation="relu",
name="lstm_layer_2",
dropout=0.2,
recurrent_dropout=0.5)))
model5.add(layers.Dense(1, activation="sigmoid", name="output"))
model5.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
plot_model(model5, show_shapes=True)
history5 = model5.fit(X_train,
y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=2,
validation_split=VALIDATION_SPLIT)
Epoch 1/20
40/40 - 15s - loss: 0.6570 - accuracy: 0.6373 - val_loss: 0.6464 - val_accuracy: 0.6042
Epoch 2/20
40/40 - 2s - loss: 0.5868 - accuracy: 0.6613 - val_loss: 0.5780 - val_accuracy: 0.7152
Epoch 3/20
40/40 - 2s - loss: 0.4950 - accuracy: 0.7579 - val_loss: 0.5086 - val_accuracy: 0.7640
Epoch 4/20
40/40 - 2s - loss: 0.4715 - accuracy: 0.7787 - val_loss: 0.4863 - val_accuracy: 0.7766
Epoch 5/20
40/40 - 2s - loss: 0.4495 - accuracy: 0.7817 - val_loss: 0.4871 - val_accuracy: 0.7726
Epoch 6/20
40/40 - 2s - loss: 0.4385 - accuracy: 0.7937 - val_loss: 0.4519 - val_accuracy: 0.7821
Epoch 7/20
40/40 - 3s - loss: 0.4278 - accuracy: 0.7913 - val_loss: 0.4508 - val_accuracy: 0.7813
Epoch 8/20
40/40 - 3s - loss: 0.4252 - accuracy: 0.8011 - val_loss: 0.4540 - val_accuracy: 0.7836
Epoch 9/20
40/40 - 3s - loss: 0.4151 - accuracy: 0.8063 - val_loss: 0.4445 - val_accuracy: 0.7860
Epoch 10/20
40/40 - 3s - loss: 0.4095 - accuracy: 0.8072 - val_loss: 0.4354 - val_accuracy: 0.7805
Epoch 11/20
40/40 - 4s - loss: 0.4057 - accuracy: 0.8125 - val_loss: 0.4425 - val_accuracy: 0.7844
Epoch 12/20
40/40 - 2s - loss: 0.4031 - accuracy: 0.8110 - val_loss: 0.4395 - val_accuracy: 0.7821
Epoch 13/20
40/40 - 2s - loss: 0.4015 - accuracy: 0.8120 - val_loss: 0.4349 - val_accuracy: 0.7828
Epoch 14/20
40/40 - 2s - loss: 0.3945 - accuracy: 0.8179 - val_loss: 0.4391 - val_accuracy: 0.7884
Epoch 15/20
40/40 - 2s - loss: 0.3951 - accuracy: 0.8145 - val_loss: 0.4409 - val_accuracy: 0.7884
Epoch 16/20
40/40 - 3s - loss: 0.3906 - accuracy: 0.8173 - val_loss: 0.4334 - val_accuracy: 0.7907
Epoch 17/20
40/40 - 2s - loss: 0.3892 - accuracy: 0.8147 - val_loss: 0.4337 - val_accuracy: 0.7876
Epoch 18/20
40/40 - 2s - loss: 0.3875 - accuracy: 0.8179 - val_loss: 0.4328 - val_accuracy: 0.7868
Epoch 19/20
40/40 - 2s - loss: 0.3839 - accuracy: 0.8220 - val_loss: 0.4333 - val_accuracy: 0.7891
Epoch 20/20
40/40 - 2s - loss: 0.3827 - accuracy: 0.8175 - val_loss: 0.4295 - val_accuracy: 0.7923
plot1(history5)
model5.evaluate(X_test, y_test, batch_size=BATCH_SIZE, verbose=2)
13/13 - 0s - loss: 0.4135 - accuracy: 0.7961
[0.4135158360004425, 0.7960981726646423]
2.8. Check Embeddings¶
Compared to one-hot encodings of characters, embeddings may include more information about the characteristics (semantics?) of the characters.
We can extract the embedding layer and apply dimensionality reduction techniques (e.g., t-SNE) to see how the embeddings capture the relationships between characters.
## A name in sequence from test set
print(X_test_texts[10])
print(X_test[10])
Betsey
[ 0 0 0 0 0 0 0 0 0 15 2 8 9 2 12]
## Extract Corpus Dictionary (mapping of chars and integer indices)
ind2char = tokenizer.index_word
[ind2char.get(i) for i in X_test[10] if ind2char.get(i) != None]
['b', 'e', 't', 's', 'e', 'y']
## Extract the embedding layer (its weights matrix)
char_vectors = model5.layers[0].get_weights()[0]
print(char_vectors.shape) ## embedding shape (vocab_size, embed_dim)
print(char_vectors[1,:]) ## first char embeddings
(29, 128)
[ 0.05686563 0.12215731 -0.06079946 0.0121353 -0.04616854 0.14015613
-0.09889542 0.07777242 -0.1028475 -0.06973263 0.09018171 0.02769772
0.0597048 0.05823955 -0.15591906 -0.01064866 0.03176391 0.04092677
-0.12599784 -0.05176904 0.11851884 0.08976518 -0.1295519 0.14801967
-0.15982433 0.07617053 0.03085207 -0.03518631 -0.06615796 0.15541968
0.03479173 0.10266954 0.05208779 0.08636085 -0.14817865 -0.02627853
0.06587387 0.05906583 0.0426388 0.13284813 -0.12247179 -0.10940979
0.06940527 -0.0205314 -0.12983604 -0.00052357 0.09427121 -0.13137868
0.06129911 0.04557774 -0.09302595 -0.00982916 0.13013926 -0.11932404
0.13151802 -0.10209155 -0.04563443 -0.15284444 0.1317115 -0.17499615
0.21118793 -0.09372313 -0.02719191 0.02869096 -0.12644686 0.07037038
0.08937194 0.15041372 0.02869375 -0.13002197 -0.07894815 0.10348763
-0.09154066 0.11610539 0.09726465 -0.12990089 -0.12233408 0.13111354
0.00609852 -0.09994514 0.13371396 -0.12472303 0.15120316 0.07099096
-0.17958435 -0.09086528 -0.15022661 -0.11646824 -0.10414851 -0.08767041
0.00112964 -0.10777231 -0.11847872 -0.11599579 -0.06148562 0.09696943
-0.14475663 -0.11240215 0.11003076 -0.08349662 -0.1374149 0.06062065
0.04149861 -0.10977595 -0.17328326 -0.15062124 0.11217364 0.03156615
-0.06679957 0.09271324 0.10324206 0.01428349 0.09207577 0.09801444
0.09336244 0.06451239 0.16503689 0.20143726 -0.10053363 0.05456796
0.1792414 0.16147721 0.06938347 0.14941241 0.07074399 0.07002545
-0.09743214 0.04573065]
labels = [char for (ind, char) in tokenizer.index_word.items()]
labels.insert(0, None)
labels
[None,
'a',
'e',
'i',
'n',
'r',
'l',
'o',
't',
's',
'd',
'm',
'y',
'h',
'c',
'b',
'u',
'g',
'k',
'j',
'f',
'v',
'p',
'w',
'z',
'x',
'q',
'-',
' ']
## Visualizing char embeddings via dimensionality reduction techniques
tsne = TSNE(n_components=2, random_state=0, n_iter=5000, perplexity=3)
np.set_printoptions(suppress=True)
T = tsne.fit_transform(char_vectors)
plt.figure(figsize=(10, 7), dpi=150)
plt.scatter(T[:, 0], T[:, 1], c='orange', edgecolors='r')
for label, x, y in zip(labels, T[:, 0], T[:, 1]):
    plt.annotate(label,
                 xy=(x + 1, y + 1),
                 xytext=(0, 0),
                 textcoords='offset points')
2.9. Issues of Word/Character Representations¶
One-hot encoding does not indicate semantic relationships between characters.
For deep learning NLP, it is preferred to convert one-hot encodings of words/characters into embeddings, which are argued to include more semantic information of the tokens.
Now the question is how to train and create better word embeddings.
There are at least two alternatives:
We can train the embeddings along with our current NLP task.
We can use pre-trained embeddings from other unsupervised learning (transfer learning).
We will come back to this issue later.
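The second alternative is usually implemented by building an embedding matrix from pretrained vectors, keyed by the tokenizer's word_index. A minimal sketch, with hypothetical vectors standing in for a real pretrained source (e.g., GloVe or word2vec):

```python
import numpy as np

# Hypothetical pretrained vectors; in practice these would be loaded from a file
pretrained = {'a': np.array([1.0, 0.0]), 'b': np.array([0.0, 1.0])}
word_index = {'a': 1, 'b': 2}   # stand-in for tokenizer.word_index

embed_dim = 2
embedding_matrix = np.zeros((len(word_index) + 1, embed_dim))  # row 0 stays zero (padding)
for tok, i in word_index.items():
    if tok in pretrained:
        embedding_matrix[i] = pretrained[tok]

# The matrix could then seed a Keras Embedding layer, e.g. via
# layers.Embedding(..., embeddings_initializer=keras.initializers.Constant(embedding_matrix),
#                  trainable=False)
```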
2.10. Hyperparameter Tuning¶
Note
Please install the keras-tuner module in your current conda environment:
pip install -U keras-tuner
or
conda install -c conda-forge keras-tuner
Like feature-based ML methods, neural networks also come with many hyperparameters, whose values need to be determined.
Typical hyperparameters include:
Number of nodes for the layer
Learning Rates
We can utilize the module keras-tuner to fine-tune these hyperparameters (i.e., to find the values that optimize the model performance).
Steps for Keras Tuner
First, wrap the model definition in a function, which takes a single hp argument.
Inside this function, replace any value we want to tune with a call to hyperparameter sampling methods, e.g., hp.Int() or hp.Choice(). The function should return a compiled model.
Next, instantiate a tuner object, specifying our optimization objective and other search parameters.
Finally, start the search with the search() method, which takes the same arguments as Model.fit() in keras.
When the search is over, we can retrieve the best model and a summary of the results from the tuner.
## confirm if the right kernel is being used
# import sys
# sys.executable
## Wrap model definition in a function
## and specify the parameters needed for tuning
# def build_model(hp):
# model1 = keras.Sequential()
# model1.add(keras.Input(shape=(max_len,)))
# model1.add(layers.Dense(hp.Int('units', min_value=32, max_value=128, step=32), activation="relu", name="dense_layer_1"))
# model1.add(layers.Dense(hp.Int('units', min_value=32, max_value=128, step=32), activation="relu", name="dense_layer_2"))
# model1.add(layers.Dense(2, activation="softmax", name="output"))
# model1.compile(
# optimizer=keras.optimizers.Adam(
# hp.Choice('learning_rate',
# values=[1e-2, 1e-3, 1e-4])),
# loss='sparse_categorical_crossentropy',
# metrics=['accuracy'])
# return model1
## wrap model definition and compiling
def build_model(hp):
m = Sequential()
m.add(
layers.Embedding(
input_dim=vocab_size,
output_dim=hp.Int(
'output_dim', ## tuning 2
min_value=32,
max_value=128,
step=32),
input_length=max_len,
mask_zero=True))
m.add(
layers.Bidirectional(
layers.LSTM(
hp.Int('units', min_value=16, max_value=64,
step=16), ## tuning 1
activation="relu",
dropout=0.2,
recurrent_dropout=0.2)))
m.add(layers.Dense(1, activation="sigmoid", name="output"))
m.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
return m
## This is to clean up the temp dir from the tuner
## Every time we re-start the tuner, it's better to keep the temp dir clean
if os.path.isdir('my_dir'):
shutil.rmtree('my_dir')
The max_trials argument sets the maximum number of hyperparameter combinations the tuner will test (the full search space is often too large to explore exhaustively, so this cap limits the tuning time).
The executions_per_trial argument is the number of models that are built and fit for each trial, for robustness purposes (i.e., more consistent results).
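To make these two knobs concrete, here is a toy, framework-free sketch of what a random search loop does (an illustration of the idea only, not Keras Tuner's implementation; train_and_score is a deterministic stand-in for building and fitting a model and reading off its validation accuracy):

```python
import random

random.seed(42)

def train_and_score(units, lr):
    ## Stand-in for building/fitting a model and returning val_accuracy
    return 1.0 - abs(units - 48) / 100 - abs(lr - 1e-3)

def random_search(max_trials, executions_per_trial):
    best_params, best_score = None, -1.0
    for _ in range(max_trials):
        ## Sample one hyperparameter combination (one "trial")
        units = random.choice([16, 32, 48, 64])
        lr = random.choice([1e-2, 1e-3, 1e-4])
        ## Fit several times per trial and average, for robustness
        scores = [train_and_score(units, lr)
                  for _ in range(executions_per_trial)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = (units, lr), mean_score
    return best_params, best_score

best_params, best_score = random_search(max_trials=10, executions_per_trial=2)
```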
## Instantiate the tuner
tuner = kerastuner.tuners.RandomSearch(build_model,
objective='val_accuracy',
max_trials=10,
executions_per_trial=2,
directory='my_dir')
## Check the tuner's search space
tuner.search_space_summary()
Search space summary
output_dim (Int)
units (Int)
%%time
## Start tuning with the tuner
tuner.search(X_train, y_train, validation_split=VALIDATION_SPLIT, batch_size=BATCH_SIZE)
[Progress-bar logs truncated: the tuner ran ten random-search trials with two executions each; validation accuracy after the first epoch hovered around 0.60–0.61 in every trial.]
INFO:tensorflow:Oracle triggered exit
CPU times: user 3min 32s, sys: 5.82 s, total: 3min 38s
Wall time: 2min 58s
## Retrieve the best models from the tuner
models = tuner.get_best_models(num_models=2)
plot_model(models[0], show_shapes=True)
## Retrieve the summary of results from the tuner
tuner.results_summary()
Results summary
2.11. Explanation¶
Train Model with the Tuned Hyperparameters¶
EMBEDDING_DIM = 128
HIDDEN_STATE = 32
model6 = Sequential()
model6.add(
layers.Embedding(input_dim=vocab_size,
output_dim=EMBEDDING_DIM,
input_length=max_len,
mask_zero=True))
model6.add(
layers.Bidirectional(
layers.LSTM(HIDDEN_STATE,
activation="relu",
name="lstm_layer",
dropout=0.2,
recurrent_dropout=0.5)))
model6.add(layers.Dense(1, activation="sigmoid", name="output"))
model6.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=["accuracy"])
plot_model(model6)
history6 = model6.fit(X_train,
y_train,
batch_size=BATCH_SIZE,
epochs=EPOCHS,
verbose=2,
validation_split=VALIDATION_SPLIT)
Epoch 1/20
40/40 - 7s - loss: 0.6504 - accuracy: 0.6391 - val_loss: 0.6388 - val_accuracy: 0.6042
Epoch 2/20
40/40 - 1s - loss: 0.5663 - accuracy: 0.6780 - val_loss: 0.5177 - val_accuracy: 0.7585
Epoch 3/20
40/40 - 1s - loss: 0.4613 - accuracy: 0.7768 - val_loss: 0.4725 - val_accuracy: 0.7679
Epoch 4/20
40/40 - 1s - loss: 0.4342 - accuracy: 0.7897 - val_loss: 0.4693 - val_accuracy: 0.7703
Epoch 5/20
40/40 - 1s - loss: 0.4270 - accuracy: 0.7980 - val_loss: 0.4610 - val_accuracy: 0.7821
Epoch 6/20
40/40 - 1s - loss: 0.4187 - accuracy: 0.8007 - val_loss: 0.4484 - val_accuracy: 0.7805
Epoch 7/20
40/40 - 1s - loss: 0.4165 - accuracy: 0.8039 - val_loss: 0.4407 - val_accuracy: 0.7868
Epoch 8/20
40/40 - 1s - loss: 0.4094 - accuracy: 0.8086 - val_loss: 0.4378 - val_accuracy: 0.7931
Epoch 9/20
40/40 - 1s - loss: 0.4064 - accuracy: 0.8100 - val_loss: 0.4417 - val_accuracy: 0.7868
Epoch 10/20
40/40 - 1s - loss: 0.3991 - accuracy: 0.8163 - val_loss: 0.4365 - val_accuracy: 0.7876
Epoch 11/20
40/40 - 1s - loss: 0.3970 - accuracy: 0.8194 - val_loss: 0.4438 - val_accuracy: 0.7868
Epoch 12/20
40/40 - 1s - loss: 0.3931 - accuracy: 0.8186 - val_loss: 0.4398 - val_accuracy: 0.7884
Epoch 13/20
40/40 - 1s - loss: 0.3919 - accuracy: 0.8173 - val_loss: 0.4349 - val_accuracy: 0.7813
Epoch 14/20
40/40 - 1s - loss: 0.3911 - accuracy: 0.8163 - val_loss: 0.4368 - val_accuracy: 0.7789
Epoch 15/20
40/40 - 1s - loss: 0.3861 - accuracy: 0.8165 - val_loss: 0.4364 - val_accuracy: 0.7844
Epoch 16/20
40/40 - 1s - loss: 0.3819 - accuracy: 0.8214 - val_loss: 0.4293 - val_accuracy: 0.7907
Epoch 17/20
40/40 - 1s - loss: 0.3822 - accuracy: 0.8177 - val_loss: 0.4364 - val_accuracy: 0.7836
Epoch 18/20
40/40 - 1s - loss: 0.3795 - accuracy: 0.8261 - val_loss: 0.4394 - val_accuracy: 0.7836
Epoch 19/20
40/40 - 1s - loss: 0.3823 - accuracy: 0.8220 - val_loss: 0.4390 - val_accuracy: 0.7836
Epoch 20/20
40/40 - 1s - loss: 0.3755 - accuracy: 0.8259 - val_loss: 0.4432 - val_accuracy: 0.7868
plot2(history6)
explainer = LimeTextExplainer(class_names=['female','male'], char_level=True)
def model_predict_pipeline(text):
    _seq = tokenizer.texts_to_sequences(text)
    _seq_pad = keras.preprocessing.sequence.pad_sequences(_seq, maxlen=max_len)
    ## LIME expects a probability for each class, so expand the
    ## sigmoid output x into [P(female), P(male)] = [1-x, x]
    return np.array([[float(1 - x), float(x)]
                     for x in model6.predict(np.array(_seq_pad))])
    # return model6.predict(np.array(_seq_pad))
text_id = 12
print(X_test_texts[text_id])
model_predict_pipeline([X_test_texts[text_id]])
Aurie
array([[0.67101669, 0.32898331]])
exp = explainer.explain_instance(X_test_texts[text_id],
model_predict_pipeline,
num_features=10,
top_labels=1)
exp.show_in_notebook(text=True)
y_test[text_id]
0
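Because the explainer was created with char_level=True, LIME treats each character of the name as a feature: it generates perturbed variants of the input by masking random characters, queries the model on those variants, and fits a local linear model to weight each character's contribution. A toy illustration of the masking step only (this mimics the idea, not LIME's actual sampling code; the mask token and sampling scheme are simplifications):

```python
import random

random.seed(0)

def mask_characters(name, mask='_', n_samples=5, p_mask=0.3):
    """Generate perturbed variants of a name by masking random characters."""
    variants = []
    for _ in range(n_samples):
        variants.append(''.join(
            mask if random.random() < p_mask else ch
            for ch in name))
    return variants

perturbed = mask_characters('Aurie')
## Each variant keeps the name's length but hides some characters;
## the model's predictions on these variants reveal which characters matter.
```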
exp = explainer.explain_instance('Tim',
model_predict_pipeline,
num_features=10,
top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance('Michaelis',
model_predict_pipeline,
num_features=10,
top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance('Sidney',
model_predict_pipeline,
num_features=10,
top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance('Timber',
model_predict_pipeline,
num_features=10,
top_labels=1)
exp.show_in_notebook(text=True)
exp = explainer.explain_instance('Alvin',
model_predict_pipeline,
num_features=10,
top_labels=1)
exp.show_in_notebook(text=True)
2.12. References¶
Chollet (2017), Ch 3 and Ch 4